Automated synchronization of a supplemental audio track with playback of a primary audiovisual presentation

ABSTRACT

Embodiments of the invention provide a method, system and computer program product for the synchronized playback of supplemental audio with the playback of a primary audiovisual presentation. The method includes acquiring in memory of a mobile device, an audio signal included as part of a playback of a primary audiovisual presentation external to the mobile device. Thereafter, a portion of the signal is speech recognized to produce a speech recognized sequence of words to be then compared to different sequences of words in a database correlating different word sequences with respectively different locations of an audio track. Consequently, a matching sequence of words in the database can be identified and also a corresponding location in the audio track. Finally, playback of the audio track can commence in the mobile device at the corresponding location.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation-in-Part of U.S. application Ser. No. 14/501,625, filed on Sep. 30, 2014, presently pending, the entire teachings of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to audio playback in a mobile device and more particularly to audio playback in a mobile device of audio that is supplemental to audio present as part of an audiovisual presentation.

2. Description of the Related Art

Video playback refers to the presentation on a display substrate of previously recorded video imagery. Historically, video playback included merely the projection of a multiplicity of frames stored in a pancake of film onto a screen, typically fabric. Audio playback simultaneously occurred with the playback of video imagery in a coordinated fashion based upon the transduction of optical patterns imprinted upon the film in association with one or more frames of imagery also imprinted upon the film. Thus, the coordination of playback of both audio and video remained the responsibility of a single projection device in the context of traditional film projection.

Unlike motion pictures, in the scholastic environment and even in the context of modern visual presentations, visual playback of a series of images such as a slide show occurs separately from the playback of accompanying audio. In this regard, it is customary for the presenter to initiate playback, and in response to a particular cue, such as the presentation of a slide that states, “press play now”, the presenter can manually initiate playback of an audio cassette to audibly supplement the presentation of a series of slides in the slide show. However, the necessity of precision in coordinating the playback of the audio cassette with the presentation of different slides is lacking in that each slide of the slide show may remain visible on a presentation screen for an extended duration.

Coordinating the playback of audio separately from the projection of a film in a movie theater is not a problem of present consideration because modern film projectors manage both audio and video playback. Likewise, coordinating the playback of video in a television from a fixed source such as a digital versatile disk (DVD) or hard disk drive or other static memory, or from a dynamic source such as streaming media over the Internet, is of no consequence since the audio portion of the presentation is present along with the video portion. However, circumstances arise where external audio may be desired in supplement to or in replacement of the audio inherently present during the projection of a film or playback or streaming of an audiovisual presentation.

For example, for an audience member who comprehends a language other than the language of a presented film and other audience members, it is desirable to simulcast audio of a language native to the audience member in lieu of the audio of the presented film that differs from the language of the audience member. Yet, coordinating the synchronized playback of the supplemental audio with the playback of the video without the cooperation of the projectionist of the film can be a manually intensive process of timing the initiation of the playback of the supplemental audio in respect to a particular cue of the film. Likewise, coordinating the synchronized playback of supplemental audio with the presentation of an audiovisual work in a television can be technically challenging absent inherent accommodations with respect to the audiovisual work or the player presenting the audiovisual work.

BRIEF SUMMARY OF THE INVENTION

Embodiments of the present invention address deficiencies of the art in respect to audio playback in a mobile device in synchronization with audio of a primary audiovisual presentation, and provide a novel and non-obvious method, system and computer program product for the synchronized playback of supplemental audio with the playback of a primary audiovisual presentation. In an embodiment of the invention, a method for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device is provided. The method includes acquiring in memory of a mobile device, an audio signal included as part of a playback of a primary audiovisual presentation external to the mobile device. In this regard, the audio signal can be acquired through a transducing microphone of the mobile device, or the audio signal can be received wirelessly and inaudibly.

Thereafter, a portion of the audio signal is speech recognized to produce a speech recognized sequence of words. The sequence of words is then compared to different sequences of words in a database correlating different word sequences with respectively different locations of an audio track supplemental to audio of the primary audiovisual presentation. Consequently, a matching sequence of words in the database can be identified for the speech recognized sequence of words and also a corresponding location in the audio track. Finally, playback of the audio track can commence in the mobile device at the corresponding location.

In one aspect of the embodiment, the playback of a primary audiovisual presentation external to the mobile device occurs in a movie theater. In another aspect of the embodiment, playback of a primary audiovisual presentation external to the mobile device occurs in a television streaming media from over a computer communications network. In either circumstance, periodically the acquiring, speech recognizing and comparing steps are repeated so as to identify a new matching sequence of words in the database for a newly speech recognized sequence of words and also a new corresponding location in the audio track. As such, playback of the audio track is re-synchronized in the mobile device commencing at the new corresponding location.

In another embodiment of the invention, a mobile device data processing system is configured for synchronizing playback in the mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device. The system includes a mobile computing device that has at least one processor, memory, cellular telephony circuitry and a display. The system further includes an audio synchronization module executing in the memory of the mobile computing device. The module includes program code enabled to acquire in the memory of the mobile computing device, an audio signal included as part of a playback of a primary audiovisual presentation external to the mobile computing device, to direct speech recognition of a portion of the audio signal to produce a speech recognized sequence of words, to compare the sequence of words to different sequences of words in a database correlating different word sequences with respectively different locations of an audio track supplemental to audio of the primary audiovisual presentation, to identify a matching sequence of words in the database for the speech recognized sequence of words and also a corresponding location in the audio track, and to playback the audio track in the mobile computing device commencing at the corresponding location.

Additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The aspects of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention. The embodiments illustrated herein are presently preferred, it being understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown, wherein:

FIG. 1 is a pictorial illustration of a process for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device;

FIG. 2 is a schematic illustration of a mobile device data processing system configured for synchronizing playback of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device; and,

FIG. 3 is a flow chart illustrating a process for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device.

DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the invention provide for the synchronization in a mobile device of playback of audio supplemental to a primary audiovisual presentation, with the playback of the primary audiovisual presentation. In accordance with an embodiment of the invention, a supplemental audio track corresponding to a primary audiovisual presentation can be selected in fixed storage of a mobile device. In response to a request to synchronize playback of the supplemental audio track with playback of the primary audiovisual presentation, audio from the primary audiovisual presentation can be speech recognized to produce a fingerprint of words. The fingerprint of words can be compared to a pattern of words present in an index of the primary audiovisual presentation. Upon matching the fingerprint of words to a pattern of words in the index, a corresponding location in the supplemental audio track can be identified and the mobile device can be directed to playback the audio track commencing at the identified location. Periodically or on-demand, a new fingerprint of words can be acquired by the mobile device during playback of the primary audiovisual presentation so as to identify a new location in the audio track. Thereafter, the playback of the audio track can be re-synchronized commencing at the new location.

In further illustration, FIG. 1 pictorially shows a process for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device. As shown in FIG. 1, a primary audiovisual presentation 120 can be played back in the theater setting by a film projector 110A, or in a home or other private setting by way of an Internet based media streaming service 110B, by way of a cable television or satellite television based media broadcasting through a set top box 110C, or by way of a media player 110D playing one or more types of storage media. Audio included as part of the primary audiovisual presentation 120 can be transmitted to a mobile device 140 within an audio signal 130. In this regard, the audio signal 130 can be audible in nature so as to be received in the mobile device by way of a transducing microphone, or inaudible in nature so as to be captured wirelessly by a wireless receiver such as a Bluetooth receiver, a “Wi-Fi” 802.11x receiver, an infrared receiver or other such functional equivalent.

Audio synchronization logic 150 executing within the mobile device 140 can process the audio signal 130 by directing the performance of speech recognition upon the audio signal 130 so as to produce a fingerprint consisting of a sequence of speech recognized words 170. The sequence of speech recognized words 170 then can be compared by the audio synchronization logic 150 to words within a sequence index 180 for a supplemental audio track 160 supplementing the audio of the primary audiovisual presentation 120. Responsive to identifying a match in the sequence index 180, a corresponding audio track location 190 can be identified indicative of a location within the stored audio track 160. Consequently, the audio synchronization logic 150 can direct playback of the stored audio track 160 in the mobile device commencing at the audio track location 190.
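
By way of non-limiting illustration only, the comparison of a speech recognized sequence of words to a sequence index may be sketched as follows. The sketch assumes, without the specification so requiring, that the index is keyed by runs of three consecutive words, that locations are expressed in seconds, and that a run occurring more than once in the transcript is treated as ambiguous. The names build_sequence_index and find_location are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types: a transcript is a list of (word, seconds) pairs, and the
# index maps a run of n words to a track location, or None when ambiguous.
Transcript = List[Tuple[str, float]]
SequenceIndex = Dict[Tuple[str, ...], Optional[float]]


def build_sequence_index(transcript: Transcript, n: int = 3) -> SequenceIndex:
    """Map each run of n consecutive words to the location of its first word."""
    index: SequenceIndex = {}
    for i in range(len(transcript) - n + 1):
        key = tuple(word.lower() for word, _ in transcript[i:i + n])
        # A run seen before cannot identify a single location, so mark it None.
        index[key] = None if key in index else transcript[i][1]
    return index


def find_location(recognized: List[str], index: SequenceIndex,
                  n: int = 3) -> Optional[float]:
    """Return the location of the first unambiguous run found in the words."""
    words = [w.lower() for w in recognized]
    for i in range(len(words) - n + 1):
        location = index.get(tuple(words[i:i + n]))
        if location is not None:
            return location
    return None


transcript = [("we", 0.0), ("must", 0.4), ("leave", 0.8), ("the", 1.1),
              ("city", 1.3), ("we", 5.0), ("must", 5.4), ("stay", 5.8)]
index = build_sequence_index(transcript)
location = find_location(["uh", "We", "must", "stay"], index)
```

In this sketch the returned location marks the start of the matched run of words; an implementation would ordinarily add an offset for the time elapsed since those words were spoken.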

The process described in connection with FIG. 1 can be implemented within a mobile device data processing system. In further illustration, FIG. 2 schematically shows a mobile device data processing system configured for synchronizing playback of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device. The system can include a mobile device 200, for instance a smart phone, tablet computer or personal digital assistant. The mobile device 200 can include at least one processor 220 and memory 230. The mobile device 200 additionally can include cellular communications circuitry 210 arranged to support cellular communications in the mobile device 200, as well as data communications circuitry 240 arranged to support data communications.

An operating system 260 can execute in the memory 230 by the processor 220 of the mobile device 200 and can support the operation of a number of computer programs, including a camera recorder 235 and a voice recorder 245. Further, a display management program 255 can operate through the operating system 260 as can an audio management program 265. Of note, an audio synchronization module 300 can be hosted by the operating system 260. The audio synchronization module 300 can include program code that, when executed in the memory 230 by the operating system 260, can act to synchronize the playback through audio output circuitry 270 of a selected audio track 215 in data store 250 of the mobile device 200 in supplement to an externally played back audiovisual presentation.

In this regard, the program code of the audio synchronization module 300 is enabled to detect external audio in an audio signal provided by the playback of the audiovisual presentation through microphone 280, or in the alternative, as provided inaudibly and directly by way of a wireless connection established between a set top box, television or media player and antenna 290 through data communications circuitry 240. Speech recognition engine 225 also executing in the memory 230 by the operating system 260 can process the audio signal to produce speech recognized words in a sequence.

Thereafter, the program code can compare the speech recognized sequence of words to a known sequence of words in an index 275 for the audiovisual presentation so as to locate a matching sequence of words. In this regard, the index 275 can include a set of the words of a transcript of the audiovisual presentation, either literally, or organized by unique instances of words. In the latter case, each word instance can be stored in a data structure including zero or more references to other words in the index which follow the word, thereby forming different indexed word sequences. Each unique word sequence additionally can include in the index 275 a corresponding location in the selected audio track 215. In this way, locating a matching sequence can be a matter of successfully traversing a tree of words in the index 275 so as to determine a corresponding position of the audio track 215 to play back in synchronization with the audio signal.
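
As one hedged sketch of the latter organization, each word instance may be represented as a node holding references to the words that follow it, so that locating a match amounts to traversing the tree one recognized word at a time. The class WordNode, the functions build_word_tree and traverse, the depth limit of three words, and the choice to retain the location of the first occurrence of a sequence are all assumptions made for illustration and are not drawn from the index 275 as described.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WordNode:
    """One word instance; followers maps each next word to its own node."""
    followers: Dict[str, "WordNode"] = field(default_factory=dict)
    # Track location (seconds) of the first word of the sequence ending here.
    location: Optional[float] = None


def build_word_tree(transcript: List[Tuple[str, float]],
                    depth: int = 3) -> WordNode:
    """Index every run of up to `depth` words starting at each transcript word."""
    root = WordNode()
    for start in range(len(transcript)):
        node = root
        for word, _ in transcript[start:start + depth]:
            node = node.followers.setdefault(word.lower(), WordNode())
            if node.location is None:  # keep the first occurrence only
                node.location = transcript[start][1]
    return root


def traverse(root: WordNode, words: List[str]) -> Optional[float]:
    """Walk the tree word by word; None means the sequence is not indexed."""
    node = root
    for word in words:
        next_node = node.followers.get(word.lower())
        if next_node is None:
            return None
        node = next_node
    return node.location


tree = build_word_tree([("stay", 2.0), ("with", 2.3), ("me", 2.6),
                        ("stay", 9.0), ("here", 9.4)])
```

Here the word "stay" appears twice in the transcript but is stored once, with separate follower references for "with" and "here" distinguishing the two sequences.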

Periodically, the program code of the audio synchronization module 300 can detect contemporaneously broadcast external audio provided by the playback of the audiovisual presentation. The detected audio within an audio signal again can be speech recognized by the speech recognition engine 225 so as to produce text for comparison with a sequence of words in the index 275. Based upon the matching of the speech recognized text to text in the index, the program code of the audio synchronization module 300 is able to precisely locate a contemporaneous position of the detected audio signal so as to coordinate the precise location in the audio track 215 to be played back through the audio output circuitry 270.
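
Purely as an illustrative assumption, the periodic coordination may be limited to those instances in which the position of playback has drifted from the newly matched location by more than a tolerance, so as to avoid audible skips for negligible differences. The function resync_position and the tolerance of 0.25 seconds are hypothetical and are not specified herein.

```python
from typing import Optional


def resync_position(current_position: float, matched_location: float,
                    tolerance: float = 0.25) -> Optional[float]:
    """Return the location to seek to, or None if playback is close enough."""
    drift = matched_location - current_position
    return matched_location if abs(drift) > tolerance else None
```

For example, with playback at 10.0 seconds, a match at 12.0 seconds would call for a seek to 12.0 seconds, whereas a match at 10.1 seconds would leave playback undisturbed.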

In even yet further illustration of the operation of the audio synchronization module 300, FIG. 3 is a flow chart illustrating a process for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device. Beginning in block 310, a synchronization request can be received in respect to a motion picture played back either publicly in a movie theater setting, or privately on a television screen or monitor. In block 320, an audio track associated with the motion picture can be selected and a location index for the audio track can be loaded into memory at block 330. In block 340, an audio signal can be acquired for the motion picture and in block 350, the audio signal can be speech recognized to produce a sequence of words.

In block 360, the sequence of words produced by the speech recognition can be retrieved for processing and compared to different sequences of words in block 370 within the loaded location index. In decision block 380, it can be determined if a match is located. If not, additional portions of the audio signal can be speech recognized in block 350 and the process can repeat through block 360. However, if in decision block 380 a match is located in the location index, in block 390 a location in the audio track corresponding to the matching sequence of words can be identified and in block 400, playback of the audio track can be directed commencing from the identified location. Thereafter, a delay can be incurred in block 410 and the process can repeat anew at block 340.
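
The flow of blocks 340 through 410 may be sketched, in a minimal and non-authoritative form, as the loop below. Audio acquisition, speech recognition and playback are supplied as callables because no particular microphone, speech recognition engine or audio player interface is specified herein; the function synchronize, its parameters, and the use of three-word runs as index keys are hypothetical. As a simplification, when no match is found the sketch acquires a fresh portion of the audio signal rather than extending the prior portion.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


def synchronize(acquire_audio: Callable[[], str],
                recognize: Callable[[str], List[str]],
                index: Dict[Tuple[str, ...], float],
                seek_and_play: Callable[[float], None],
                passes: int,
                delay_seconds: float = 0.0,
                n: int = 3) -> List[float]:
    """Run the acquire, recognize, compare and play cycle `passes` times."""
    synced: List[float] = []
    for _ in range(passes):
        signal = acquire_audio()                         # block 340
        words = [w.lower() for w in recognize(signal)]   # blocks 350 and 360
        location: Optional[float] = None
        for i in range(len(words) - n + 1):              # block 370
            location = index.get(tuple(words[i:i + n]))
            if location is not None:
                break
        if location is None:                             # block 380: no match
            continue
        seek_and_play(location)                          # blocks 390 and 400
        synced.append(location)
        time.sleep(delay_seconds)                        # block 410
    return synced


# Stand-ins for illustration: each "signal" is already text, so the
# recognizer merely splits it into words.
chunks = iter(["static noise only here", "uh we must stay"])
played: List[float] = []
result = synchronize(lambda: next(chunks), str.split,
                     {("we", "must", "stay"): 5.0}, played.append, passes=2)
```

In this usage the first acquired portion yields no match and is skipped, while the second portion matches and directs playback to the location 5.0 seconds.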

The present invention may be embodied within a system, a method, a computer program product or any combination thereof. The computer program product may include a computer readable storage medium or media having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.

A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Finally, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Having thus described the invention of the present application in detail and by reference to embodiments thereof, it will be apparent that modifications and variations are possible without departing from the scope of the invention defined in the appended claims as follows: 

We claim:
 1. A method for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device, the method comprising: acquiring in memory of a mobile device, an audio signal included as part of a playback of a primary audiovisual presentation external to the mobile device; speech recognizing a portion of the audio signal to produce a speech recognized sequence of words; comparing the sequence of words to different sequences of words in a database correlating different word sequences with respectively different locations of an audio track supplemental to audio of the primary audiovisual presentation; identifying a matching sequence of words in the database for the speech recognized sequence of words and also a corresponding location in the audio track; and, playing back the audio track in the mobile device commencing at the corresponding location.
 2. The method of claim 1, wherein the audio signal is acquired through a transducing microphone of the mobile device.
 3. The method of claim 1, wherein the audio signal is received wirelessly and inaudibly.
 4. The method of claim 1, wherein the playback of a primary audiovisual presentation external to the mobile device occurs in a movie theater.
 5. The method of claim 1, wherein the playback of a primary audiovisual presentation external to the mobile device occurs in a television streaming media from over a computer communications network.
 6. The method of claim 1, further comprising: periodically repeating the acquiring, speech recognizing and comparing to identify a new matching sequence of words in the database for a newly speech recognized sequence of words and also a new corresponding location in the audio track; and, re-synchronizing playback of the audio track in the mobile device commencing at the new corresponding location.
 7. A mobile device data processing system configured for synchronizing playback in the mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device, the system comprising: a mobile computing device comprising at least one processor, memory, cellular telephony circuitry and a display; and, an audio synchronization module executing in the memory of the mobile computing device, the module comprising program code enabled to acquire in the memory of the mobile computing device, an audio signal included as part of a playback of a primary audiovisual presentation external to the mobile computing device, to direct speech recognition of a portion of the audio signal to produce a speech recognized sequence of words, to compare the sequence of words to different sequences of words in a database correlating different word sequences with respectively different locations of an audio track supplemental to audio of the primary audiovisual presentation, to identify a matching sequence of words in the database for the speech recognized sequence of words and also a corresponding location in the audio track, and to playback the audio track in the mobile computing device commencing at the corresponding location.
 8. The system of claim 7, wherein the audio signal is acquired through a transducing microphone of the mobile computing device.
 9. The system of claim 7, wherein the audio signal is received wirelessly and inaudibly utilizing data communications circuitry present in the mobile computing device.
 10. The system of claim 8, wherein the playback of a primary audiovisual presentation external to the mobile computing device occurs in a movie theater.
 11. The system of claim 9, wherein the playback of a primary audiovisual presentation external to the mobile computing device occurs in a television streaming media from over a computer communications network.
 12. The system of claim 7, wherein the program code is further enabled to: periodically repeat the acquiring, speech recognizing and comparing to identify a new matching sequence of words in the database for a newly speech recognized sequence of words and also a new corresponding location in the audio track; and, re-synchronize playback of the audio track in the mobile device commencing at the new corresponding location.
 13. A computer program product for synchronizing playback in a mobile device of audio supplemental to a primary audiovisual presentation, with playback of the primary audiovisual presentation in an external device, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to perform a method comprising: acquiring in memory of a mobile device, an audio signal included as part of a playback of a primary audiovisual presentation external to the mobile device; speech recognizing a portion of the audio signal to produce a speech recognized sequence of words; comparing the sequence of words to different sequences of words in a database correlating different word sequences with respectively different locations of an audio track supplemental to audio of the primary audiovisual presentation; identifying a matching sequence of words in the database for the speech recognized sequence of words and also a corresponding location in the audio track; and, playing back the audio track in the mobile device commencing at the corresponding location.
 14. The computer program product of claim 13, wherein the audio signal is acquired through a transducing microphone of the mobile device.
 15. The computer program product of claim 13, wherein the audio signal is received wirelessly and inaudibly.
 16. The computer program product of claim 13, wherein the playback of a primary audiovisual presentation external to the mobile device occurs in a movie theater.
 17. The computer program product of claim 13, wherein the playback of a primary audiovisual presentation external to the mobile device occurs in a television streaming media from over a computer communications network.
 18. The computer program product of claim 13, wherein the method further comprises: periodically repeating the acquiring, speech recognizing and comparing to identify a new matching sequence of words in the database for a newly speech recognized sequence of words and also a new corresponding location in the audio track; and, re-synchronizing playback of the audio track in the mobile device commencing at the new corresponding location. 